This report is a compilation of my R notebooks in which I performed several analysis. As such, it will read like a lab notebook. I hope it will explain my analyses and my thinking and provide enough information for the next person to pick up this project. Please check out the source and project files for orographic_precip on my Google Drive.
Observations of the influence of topography on precipitation patterns exist throughout the world, including in the European Alps, Andes, Sierra Nevada in California, Olympic Mountains in Washington State, and Himalaya (Alpert 1986; Reiners et al. 2003; Anders et al. 2006; Anders et al. 2007). There is an undeniable relationship between topography and the spatial pattern of precipitation (Anders et al. 2006), but the direct link between precipitation and erosion is unclear (Reiners et al. 2003; Molnar 2003; D. W. Burbank et al. 2003; Dadson et al. 2003). Landscape evolution modeling enables the simulation of long-term topographic change by balancing the competing processes of uplift and erosion. Several studies have suggested the importance of spatially-varying precipitation for predicting landscape evolution (Anders et al. 2007; Han, Gasparini, and Johnson 2015). A recent advance in modeling precipitation in complex terrain includes the Linear Theory of Orographic Precipitation (LT model) (Smith and Barstad 2004) which linearizes the complex terrain using a Fourier Transform and enables rapid simulation of the processes driving orographically-enhanced precipitation.
A common signature of orographically-enhanced precipitation is a wet side and dry side of a mountain range (Figure 1). As a moist airmass is pushed up the windward side of the mountain, the air mass cools and water vapor condenses to water droplets. The water droplets precipitate preferentially on the windward side. The dry air mass is pushed to the leeward side, but little to no moisture is available to precipitate.
Figure 1. Rainshadow effect. Source: https://commons.wikimedia.org/wiki/File:Rainshadow_copy.jpg
I evaluated modeled precipitation in the tropical and extratropical Andes, and South Africa (Figure 2).
Figure 2. Global Mean Precipitation as estimated from the Tropical Rainfall Measurement Mission (TRMM)
I used the Tropical Rainfall Measurement Mission (TRMM) as the basis for the analysis. TRMM is a satellite that was launched in 1997 with 5 instruments to measure rainfall in tropical areas. Operation ended in 2015 see NASA TRMM webpage.
I used the 2B31 product as published by Bodo Bookhagen and availabe here (see Figure 2). The 2B31 data product is a combined Precipitation Radar (PR) / TRMM Microwave Imager (TMI) rainfall intensity product that Bodo bias-corrected using precipitation gauges. This product is available at 0.04° resolution and was reprojected to a South American Albers projection at 5km resolution. See Bookhagen and Burbank (2010) for product details.
A second TRMM product used is the 3B42 which is 3-hourly precipitation rate available at 0.25° resolution from 1998-2015. I processed this on Google Earth Engine to estimate the number of hours of rainfall per year. Each 3 hour period was classified as “raining” or “not raining” and the hours of rainfall were summed per year. The temporal median of each pixel was computed for this analysis (Figure 3).
Figure 3. Global median hours of precipitation as estimated from TRMM
The median hours was reprojected to the same grid as the 2B31 product using bilinear resampling.
The Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) digital elevation model was downloaded and cropped for the different domains. The median Statistic, 30 arc-seconds DEM was reprojected to the same 5km grid as the TRMM products using bilinear resampling.
The Linear Theory of Orographic Precipitation (LT model) is used to simulate precipitation in mountainous topography based on a DEM (Smith and Barstad (2004)). The LT model computes precipitaton intensity by estimating the condensation and evaporation processes in a saturated air mass as it rises and falls in elevation. The atmosphere is assumed to have a constant temperature profile and windspeed is constant in space and time both horizontally and verically. Water vapor condenses as the air mass rises and precipitation falls out after some delay. Time delays are used to approximate the effect of the cloud physics. As a cloud decreases in elevation, remaining water droplets may evaporate before they can fall out as precipitation.
I computed hourly precipitation rates for different mountain ranges around the world using the R package LinearTheoryOrographicPrecip available on github. I assumed a 45° wind direction for tropical and extratropical Andes and 135° wind direction for southern Africa. The remaining parameter values for the model varied based on the analysis and can be seen below. To address the first question, I used the TRMM-derived hours of precipitation per year to scale the rainfall intensity (mm/hr) to mm/yr. To address the second question I considered how to scale the rainfall intensity without the use of remote sensing.
I used the Nash-Sutcliffe (NS) coefficient to test the skill of a simulation. The NS coefficient compares the squared errors from the model with the squared error from the mean, i.e. does the model produce a result that is better than the mean of the observations? Values between 0 and 1 indicate the model is better than the mean. A value of 0 indicates the model performs the same as the mean. Values <0 indicate the mean more accurately presents the observations.
\[ NS = 1-\frac{\sum{(y_{model}-y_{obs})^2}}{\sum{(y_{model}-mean(y_{obs}))^2}} \]
What follows are six analyses that I performed to address the two science questions. Each of the analyses run independently, but the ideas build upon the former analysis. Each analysis starts with an Introduction explaining the purpose of the analysis, a Method section detailing any methods specific to the analysis, an Outcome section for a summary of results, and finally the Details of the analysis.
Which parameters of the LT model are the most sensitive?
Do the best parameters produce modeled precip that matches that from TRMM?
We only evaluate a subset of parameters including:
In general, I found the model wasn’t that sensitive to any of these parameters when compared against TRMM precip (and scaled using TRMM precip hours), and the resulting distribution of precipitation looks quite reasonable when compared against observations. Background precip was perhaps the most important. Final parameter values for other experiments were chosen to be:
background precip = 2 mm/hr
time delay(s) = 800 s (this value is partially inspired by Anders et al. 2008, Fig 1)
wind speed = 5 m/s
truncation value = 0 mm/hr
In my opinion, the model does remarkably well at capturing the overall pattern of precipitation. There are differences between the observed and modeled maps (see below) and the distributions of precip (see below), but given that the model doesn’t use any observations I was surprised.
t_delay = 800
trunc_value = 0
#winddir defined with location
windspeed = 5
bckgrd_precip=2
# Define your location. Most locations these strings will be the same but for South America, I use the same DEM (elev_SAmerica_albers.tif) for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elevation_string='SAmerica'
location_string='tropicalAndes'
winddir = 45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))
#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment with <<-
suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
preprocess_rasters(elev_string=elevation_string,location_string=location_string)
## Warning in sqrt(-1 * m_sqr): NaNs produced
precipdF %>%
# gather(source,precip,trmm:LT) %>%
ggplot(aes(trmm,LT))+
geom_point()+
geom_abline(col='red')+
geom_density2d()+
coord_cartesian(ylim=c(0,4000),xlim=c(0,4000))
precipdF %>%
gather(source,precip,trmm:LT) %>%
ggplot(aes(precip))+
geom_density(aes(color=source)) +
coord_cartesian(xlim=c(0,4000))
precipdF %>%
mutate(error=LT-trmm) %>%
ggplot()+
geom_point(aes(x=trmm,y=error))
## Warning: Removed 82588 rows containing missing values (geom_point).
This parameter grid search takes a while so I ran everything and saved the results. Nash-sutcliffe was used to compare model and observations. Here is an example of how the experiment was setup and the top results.
# Define your location. Most locations these strings will be the same but for South America, I use the same DEM (elev_SAmerica_albers.tif) for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elev_string='SAmerica'
location_string='extratropicalAndes'
winddir=45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))
#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment
suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
preprocess_rasters(elev_string,location_string)
skill_fn <- file.path(RPROJHOME,paste0('output/LT_parameter_optimization_',location_string,'_winddir-',winddir,'.rds'))
if(file.exists(skill_fn)){
skilldF <- readRDS(skill_fn)
} else {
paramdf <- cross_df(.l=list(bckgrd_precip=seq(1,3,.5),t_delay=seq(700,1500,100),windspeed=seq(5,5,5),winddir=winddir,trunc_value=seq(-2,0,0.5)))
skilldF <-
paramdf %>%
mutate(
skill=pmap_dbl(.,LToptimize,elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask,NS,Intfn)
)
saveRDS(skilldF,skill_fn)
}
skilldF %>%
arrange(desc(skill))
## # A tibble: 225 x 6
## bckgrd_precip t_delay windspeed winddir trunc_value skill
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 1200 5 45 -2.0 0.4051471
## 2 2 1200 5 45 -1.5 0.4051471
## 3 2 1200 5 45 -1.0 0.4051471
## 4 2 1200 5 45 -0.5 0.4051471
## 5 2 1300 5 45 -2.0 0.4049005
## 6 2 1300 5 45 -1.5 0.4049005
## 7 2 1300 5 45 -1.0 0.4049005
## 8 2 1300 5 45 -0.5 0.4049005
## 9 2 1100 5 45 -2.0 0.4042259
## 10 2 1100 5 45 -1.5 0.4042259
## # ... with 215 more rows
The results are presented as pairplots. The rows and columns are the 4 parameters identified above, such that the x and y axis correspond to the values of the parameter indicated by the grey box. The main interest will be the raster images in the bottom left. The color represenst the skill score. Positive values indicate the result was better than the mean of the observations. The diagonal plots were an experiment to plot the actual skill scores agains the parameter values- they are hard to interpret so I would ignore them.
Just for fun, here are the top performing models.
| region | bckgrd_precip | t_delay | windspeed | winddir | trunc_value | skill |
|---|---|---|---|---|---|---|
| tropicalAndes | 1.5 | 1500 | 5 | 45 | -2.0 | 0.6055060 |
| tropicalAndes | 1.5 | 1500 | 5 | 45 | -1.5 | 0.6055060 |
| tropicalAndes | 1.5 | 1500 | 5 | 45 | -1.0 | 0.6055060 |
| tropicalAndes | 1.5 | 1500 | 5 | 45 | -0.5 | 0.6055060 |
| tropicalAndes | 2.0 | 1500 | 5 | 45 | -2.0 | 0.6021763 |
| tropicalAndes | 2.0 | 1500 | 5 | 45 | -1.5 | 0.6021763 |
| tropicalAndes | 2.0 | 1500 | 5 | 45 | -1.0 | 0.6021763 |
| tropicalAndes | 2.0 | 1500 | 5 | 45 | -0.5 | 0.6021763 |
| tropicalAndes | 1.5 | 1400 | 5 | 45 | -2.0 | 0.5971641 |
| tropicalAndes | 1.5 | 1400 | 5 | 45 | -1.5 | 0.5971641 |
| tropicalAndes | 1.5 | 1400 | 5 | 45 | -1.0 | 0.5971641 |
| tropicalAndes | 1.5 | 1400 | 5 | 45 | -0.5 | 0.5971641 |
| tropicalAndes | 2.0 | 1400 | 5 | 45 | -2.0 | 0.5965794 |
| tropicalAndes | 2.0 | 1400 | 5 | 45 | -1.5 | 0.5965794 |
| tropicalAndes | 2.0 | 1400 | 5 | 45 | -1.0 | 0.5965794 |
| tropicalAndes | 2.0 | 1400 | 5 | 45 | -0.5 | 0.5965794 |
| tropicalAndes | 2.0 | 1300 | 5 | 45 | -2.0 | 0.5896795 |
| tropicalAndes | 2.0 | 1300 | 5 | 45 | -1.5 | 0.5896795 |
| tropicalAndes | 2.0 | 1300 | 5 | 45 | -1.0 | 0.5896795 |
| tropicalAndes | 2.0 | 1300 | 5 | 45 | -0.5 | 0.5896795 |
| southafrica | 1.5 | 800 | 5 | 135 | 0.0 | 0.4425294 |
| southafrica | 1.5 | 900 | 5 | 135 | 0.0 | 0.4419318 |
| southafrica | 1.5 | 1000 | 5 | 135 | 0.0 | 0.4412918 |
| southafrica | 1.5 | 1100 | 5 | 135 | 0.0 | 0.4406361 |
| southafrica | 1.5 | 1200 | 5 | 135 | 0.0 | 0.4399860 |
| southafrica | 1.5 | 1300 | 5 | 135 | 0.0 | 0.4393507 |
| southafrica | 1.5 | 1400 | 5 | 135 | 0.0 | 0.4387423 |
| southafrica | 1.5 | 1500 | 5 | 135 | 0.0 | 0.4381648 |
| extratropicalAndes | 2.0 | 1200 | 5 | 45 | -2.0 | 0.4051471 |
| extratropicalAndes | 2.0 | 1200 | 5 | 45 | -1.5 | 0.4051471 |
| extratropicalAndes | 2.0 | 1200 | 5 | 45 | -1.0 | 0.4051471 |
| extratropicalAndes | 2.0 | 1200 | 5 | 45 | -0.5 | 0.4051471 |
| extratropicalAndes | 2.0 | 1300 | 5 | 45 | -2.0 | 0.4049005 |
| extratropicalAndes | 2.0 | 1300 | 5 | 45 | -1.5 | 0.4049005 |
| extratropicalAndes | 2.0 | 1300 | 5 | 45 | -1.0 | 0.4049005 |
| extratropicalAndes | 2.0 | 1300 | 5 | 45 | -0.5 | 0.4049005 |
| extratropicalAndes | 2.0 | 1100 | 5 | 45 | -2.0 | 0.4042259 |
| extratropicalAndes | 2.0 | 1100 | 5 | 45 | -1.5 | 0.4042259 |
| extratropicalAndes | 2.0 | 1100 | 5 | 45 | -1.0 | 0.4042259 |
| extratropicalAndes | 2.0 | 1100 | 5 | 45 | -0.5 | 0.4042259 |
| extratropicalAndes | 2.0 | 1400 | 5 | 45 | -2.0 | 0.4037974 |
| extratropicalAndes | 2.0 | 1400 | 5 | 45 | -1.5 | 0.4037974 |
| extratropicalAndes | 2.0 | 1400 | 5 | 45 | -1.0 | 0.4037974 |
| extratropicalAndes | 2.0 | 1400 | 5 | 45 | -0.5 | 0.4037974 |
| extratropicalAndes | 2.0 | 1500 | 5 | 45 | -2.0 | 0.4020013 |
| extratropicalAndes | 2.0 | 1500 | 5 | 45 | -1.5 | 0.4020013 |
| extratropicalAndes | 2.0 | 1500 | 5 | 45 | -1.0 | 0.4020013 |
| extratropicalAndes | 2.0 | 1500 | 5 | 45 | -0.5 | 0.4020013 |
| southafrica | 1.5 | 1500 | 5 | 135 | -3.0 | 0.3945932 |
| southafrica | 1.5 | 1500 | 5 | 135 | -2.5 | 0.3945932 |
| southafrica | 1.5 | 1500 | 5 | 135 | -2.0 | 0.3945932 |
| southafrica | 1.5 | 1500 | 5 | 135 | -1.5 | 0.3945932 |
| southafrica | 1.5 | 1500 | 5 | 135 | -1.0 | 0.3945932 |
| southafrica | 1.5 | 1500 | 5 | 135 | -0.5 | 0.3945932 |
| southafrica | 1.5 | 1400 | 5 | 135 | -3.0 | 0.3936591 |
| southafrica | 1.5 | 1400 | 5 | 135 | -2.5 | 0.3936591 |
| southafrica | 1.5 | 1400 | 5 | 135 | -2.0 | 0.3936591 |
| southafrica | 1.5 | 1400 | 5 | 135 | -1.5 | 0.3936591 |
| southafrica | 1.5 | 1400 | 5 | 135 | -1.0 | 0.3936591 |
| southafrica | 1.5 | 1400 | 5 | 135 | -0.5 | 0.3936591 |
| southafrica | 1.5 | 1300 | 5 | 135 | -3.0 | 0.3925315 |
| southafrica | 1.5 | 1300 | 5 | 135 | -2.5 | 0.3925315 |
| southafrica | 1.5 | 1300 | 5 | 135 | -2.0 | 0.3925315 |
| southafrica | 1.5 | 1300 | 5 | 135 | -1.5 | 0.3925315 |
| southafrica | 1.5 | 1300 | 5 | 135 | -1.0 | 0.3925315 |
| southafrica | 1.5 | 1300 | 5 | 135 | -0.5 | 0.3925315 |
This analyis was motivated by the desire to include observed (or estimated based on some knowledge of climate) precipitation variability.
If we know mean and variability of precipitation, how can we parameterize background precipitation in LTmodel?
We compute stochastic precipitation timeseries for the year. We assume gamma distributed precip amount around some mean and the storm events are exponentially distributed. The precipitation function is based on the stochastic precip generator in Landlab
The mean from the stochasitc precipitation timeseries is used for the background precipitation in the LT model.
It is possible to match the mean observed yearly precipitation using a background precipitation stochastically generated. I did not look at any data to determine what reasonable even duration, interevent duration or storm depth should be, but the values shown below reproduce observed precip reasonably well.
First, define some mean attributes of our storms:
#This is a very rainy place!
avg_event_duration = 48#hours
avg_interevent_duration = 36#hours
avg_storm_depth = 50 #mm
total_t=365*24 #total duration of timeseries in hours
Estimate the timeseries and plot the result:
ggplot(precip_timeseries)+
geom_col(aes(x=storm_start,y=intensity))+
labs(x='storm event [hrs since Jan 1 00:00]',
y='intensity [mm/hr]')
Here are the precipitation statistics for the year:
We need to define some parameters for the LT model. We will use annual intensity from the stochastic storms as the background precipitation.
location_string='tropicalAndes'
elev_extratrop <- raster(file.path(RPROJHOME,paste0('data/elev/SAmerica/elev_',location_string,'_albers.tif')))
winddir = 45
bckgrd_precip = annual_precip_stats$annual_intensity
t_delay = 800
trunc_value = 0
#winddir defined above with location
windspeed = 5
We’ll scale the LT model output by the number of observed hours of precip from TRMM. Then we can compare the mm/yr from TRMM to our modeled precip distribution.
Here are the hours of precipitation as observed by TRMM:
Here are the hours of precip observed by TRMM cropped to the tropical Andes:
mtn_ext=raster(file.path(RPROJHOME,paste0('gis/mtnmask_tropicalAndes.tif')))
mtn_ext <- projectRaster(mtn_ext,trmm_hours) %>% trim()
trmm_hours_mtn <- crop(trmm_hours,mtn_ext)
data(World)
plot(trmm_hours_mtn)
world_mtn <- spTransform(World[World$continent=='South America',],projection(trmm_hours_mtn))
crop(world_mtn,trmm_hours_mtn) %>% plot(add=T)
# MASS::fitdistr(raster::getValues(trmm_hours_mtn),'gamma')
To estimate the annual precip based on the LT model, I multiply the LT result by annual observed stormhours.
LTp_annual <- LTprecip * trmm_hours
Here you can see a larger extent than in the previous analysis and broadly speaking the results from the flatter areas exhibit the same trend as the observations. Higher precip is observed in the north trending to lower precip in the east.
Here we calculated the NS coefficient between the modeled and observed precip for the larger area. The skill is positive which means the distribution is better than the mean (barely). We saw previously that a background precipitation of 2 mm/hr worked well in most cases so unless you have good reason to produce precipitation stochastically it’s probably not worth it.
## [1] 0.1766396
Nonetheless, the spatial mean precip agree quite well between the observation and modeled precip.
The mean precip observed by TRMM is 1212.493716.
The mean precip from the LT model is 1120.7422351.
We can zoom into the mountains region to compare the distributions visually. We see good agreement in the location of the dark bands (indicating high precip), and these light on the windward side of the ridges.
Next we can check the errors between the modeled precip and observed precip.
A scatter plot of modeled vs observed shows there to be some deviation at low observations but generally the model and observations follow the 1:1 line. The blue lines show the density of points.
precipdF %>%
# gather(source,precip,trmm:LT) %>%
ggplot(aes(trmm,LT))+
geom_point()+
geom_abline(col='red')+
geom_density2d()+
coord_cartesian(ylim=c(0,4000),xlim=c(0,4000))
Here is the probability density function for the observed and modeled precip. The general shape is captured by the model but there is some deviation at the higher end of values.
The goal of this notebook is to determine if there is a single value of hours of precipitation that can scale the LT model hourly rainfall intensity to yealry rainfall. The idea is to replace the spatial field of precip hours from TRMM with a value that can be used if no observations are available.
Multiple scalar values are tested to get a sense of how certain the value of hours can be. Nash-sutcliffe coefficient is used to check the yearly output against the mean of the observations from TRMM.
As you can see at the bottom, there is no single value that scales hourly precip intensity to yearly precip rate better than using the mean observed precip. In fact, 0 hours provides the best estimate. Odd I think
# Define your location. Most locations these strings will be the same but for South America, I use the same DEM (elev_SAmerica_albers.tif) for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elev_string='SAmerica'
location_string='extratropicalAndes'
winddir=45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))
#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment with <<-
suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
preprocess_rasters(elev_string,location_string)
The median hours of precip is 436.5 hours.
# trmm_avghrs_mtn <- setValues(trmm_hours_mtn,avg_hrs) %>% mask(trmm_hours_mtn)
trmm_medhrs_mtn <- setValues(trmm_hours_mtn,med_hrs) %>% mask(trmm_hours_mtn)
#
trmm_hrs_list=list(trmm_hours_mtn,trmm_medhrs_mtn)
# Set up an array of hours to test.
hrs_vector=c(med_hrs,seq(0,500,100))
trmm_hrs_list <- lapply(hrs_vector,function(x) setValues(trmm_hours_mtn,x) %>% mask(trmm_hours_mtn))
paramdf <- cross_df(.l=list(bckgrd_precip=2,t_delay=800,windspeed=5,winddir=winddir,trunc_value=0,trmm_hours_raster=trmm_hrs_list))
skilldF <-
paramdf %>%
mutate(
hrsprecip=hrs_vector,
skill=pmap_dbl(.,LToptimize,elev_raster=elev,trmm_obs_raster=trmm_precip_mtn,mtn_mask=mtn_mask,NS,Intfn)
)
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
skilldF %>%
arrange(desc(skill))
The idea of this notebook is to test whether the hourly precip intensity can be scaled to a yearly rainfall without remote sensing.
Two different scaling methods are tested:
The nash-sutclife coefficient is used to test the modeled distribution to the mean of the observations.
In all cases, scaling the hourly intensity to a yearly precip distribution was worse than the mean observed precipition. In general, the 2 parameter scaling performed better than 1 parameter scaling.
This test was performed for several different domains with similar results. here is an example:
# Define your location. Most locations these strings will be the same but for South America, I use the same DEM (elev_SAmerica_albers.tif) for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elev_string='SAmerica'
location_string='extratropicalAndes'
winddir=45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))
#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment with <<-
suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
preprocess_rasters(elev_string,location_string)
We scaled with an assumed min and max. Min is always 0 so really we are scaling the maximum value and applying some uncertainty. I scaled with the following function:
\[ \frac{(max_{model}-min_{model})(x-min_{assumed})}{max_{assumed}-min_{assumed}} + min_{model} \]
min_precip_trmm <- cellStats(trmm_precip_mtn,'min')
max_precip_trmm <- cellStats(trmm_precip_mtn,'max')
frac_vector <- seq(0.5,1.5,0.1)
assumed_max <- max_precip_trmm*frac_vector
assumed_min <- min_precip_trmm*frac_vector
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
I scaled with the mean and applied some uncertainty. This is in effect the same as the previous analysis where different scalar hours were tested. The mean scaling was performed by multiplying by the ratio of assumed mean and the modeled mean:
\[ x*\frac{mean_{assumed}}{mean_{model}} \]
# suppressWarnings(rm(elev,trmm_hours_mtn,trmm_precip_mtn,mtn_mask))
# preprocess_rasters(elev_string,location_string)
avg_precip_trmm <- cellStats(trmm_precip_mtn,'mean')
frac_vector <- seq(0.5,1.5,0.1)
assumed_avg <- avg_precip_trmm*frac_vector
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
## [1] "reading FFT from file"
This notebook examines the relationshp between the hours of precip from TRMM to some terrain variables including relief, slope, and aspect. The hours of precip is so far critical to converting hourly rainfall intensity to yearly rainfall with any success. This notebook trieds to find an alternative parameterization for the number of precipitation hours in each grid cell using some basic terrain variables. Since precipitation is orographically-enhanced, it might be that these orographically-driven processes also affect the duration of precip.
Relief is defined as the maximum elevation difference within a 3x3 pixel window. This is implemented in GDAL as ‘roughness’ (and calculated below using the raster::terrain(). With a 5km resolution DEM, the relief is at the 15km scale.
Slope and Aspect are computed in radians.
The machine learning algorithm randomForest is known for estimating non-linear relationships with minimal overfitting. This was used to see if there is a quantifable relationship between hours of precip and some basic terrain variables (relief, slope, aspect). Such a relationship would facilitate a parameterization of hours of precip to get over the hurdle of scaling hourly LT intensity to yearly rainfall rate.
The results of this analysis suggest that the relationship between hours of precip and these terrain variables is weak. I do not try to scale LT intensity to yearly rainfall with this parameterization because the explained variance is only 11%.
I chose to look at tropical Andes first:
We compute the rainfall intensity using a set of parameters from previous analyses.
LTintensity <- LTintensity(bckgrd_precip=2,t_delay=800,windspeed=5,winddir=winddir,trunc_value=0, Intfn=Intfn, elev_raster = elev, mtn_mask=mtn_mask)
Now we can look at the LT intensity, roughness, and hours of precip with pairplots.
Here are some pairplots of the variables. Only LT and relief show any potential for a linear relationship which is not unexpected and not particularly helpful.
In an effort to model hours from TRMM, I looked to see if relief, slope, and aspect could be combined nonlinearly in a randomForest model. I’m not using northness because that is just a linear combination of slope and aspect.
rowsample <- sample(seq_len(nrow(dF)), .3*nrow(dF))
dF_rf <-
dF %>%
mutate(trmmhours=scale(trmmhours),
relief=scale(relief),
slope=scale(slope),
aspect=scale(aspect))
hours_rf <-
randomForest::randomForest(
x=as.matrix(dF_rf[rowsample,c('relief','slope','aspect')]),
y=as.numeric(dF_rf[rowsample,]$trmmhours),
mtry=2,
na.action=na.rm
)
hours_rf
##
## Call:
## randomForest(x = as.matrix(dF_rf[rowsample, c("relief", "slope", "aspect")]), y = as.numeric(dF_rf[rowsample, ]$trmmhours), mtry = 2, na.action = na.rm)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 2
##
## Mean of squared residuals: 0.8947818
## % Var explained: 10.49
The results are not very promsing with a pseudo-r**2 of only .11. A quick look at the observed vs predicted values below suggests mostly a scatter shot of predictive ability. It is difficult to generalize a model like this to any DEM so I won’t pursue this any further.
xtest=as.matrix(dF_rf[-rowsample,c('relief','slope','aspect')])
ytest=as.numeric(dF_rf[-rowsample,]$trmmhours)
cen=attr(dF_rf$trmmhours,which = 'scaled:center')
sca=attr(dF_rf$trmmhours,which='scaled:scale')
plot(ytest*sca+cen, predict(hours_rf,newdata=xtest)*sca+cen)
The goal of this notebook is to explore whether precipitation can be paramterized in swaths longitudinal to the wind. This approach is inspired by Bodo’s 2010 paper looking at the hydrologic budget in the Himalaya (see Figure 6) It makes sense because there could be substantial differences in weather patterns along a mountain range.
I created 100 km swaths oriented in the direction of the wind (specific to the domain). Swaths less than 100km wide are quite computation intensive and harder to visualize. For each swath, I averaged the variables across the swath (perpendicular to the wind) to be able to plot profiles for each swath as well as fit a non-linear statistical model for precip and precip hours.
For the statistical model I used a generalized additive model (GAM) because it allows you to see how the relationship between independent and dependent variables changes. Each model is fit on a random subset of the data and then tested on the other part of the data.
The statistical models performed better than the randomForest previously. Although these analyses are not directly comparable (different statistical models, different r2 formula), they suggest that looking at the swath profiles in the direction of wind has more potential for parameterizing precip hours or precip than broadly computing and scaling the LT model.
This outcome corresponds with my conversation with Bodo, i.e. relief and profile distance may be the best way to paramterize precip. Figure 6 in Bookhagen and Burbank, 2010 also points to this conclusion.
I ran out of time to continue this, but I think it would be worthwhile to compare models for different domains to see how the parameter coefficients are similar (or not).
I’ve primarily done this in the tropical Andes.
# Define your location. Most locations these strings will be the same but for South America, I use the same DEM for the tropical and extratropical andes despite optimizing separately.
# -- you'll need to decide on a wind direction for your loaction. W/SW for Andes. E for Africa. SW for himalaya
# -- you'll need to setup a place to save intermediate FFT so it doesn't need to be regenerated each iteration
elev_string='SAmerica'
location_string='tropicalAndes'
winddir=45
#
Intdir <- file.path(RPROJHOME,'int')
if(!dir.exists(Intdir)) dir.create(Intdir)
Intfn <- file.path(Intdir,paste0('fft_',location_string,'.rds'))
#This function processes DEM and TRMM data to the same grid for your desired location. Variable assignments are made from the function to the global environment with <<-
suppressWarnings(rm(elev,trmm_hours,trmm_precip))
preprocess_rasters(elev_string,location_string)
roughness=terrain(elev,opt='roughness')
Choose a swath width. Bodo uses 50km (I think) but the smaller they are then the longer the analysis takes.
TODO:
The mean should really be taken perpendicular to the swath instead of vertically along the y axis.
These profiles should be shifted so that x=0 on the graph corresponds to the highest point in the swath. It would be easier to view but my analytica abilities are failing me at the moment.
swath_width=300000
Let’s view the distribution of precip and the hours of precip across all swaths and at all distances along the profile. Both are reasonable normally distributed except 0 truncation of course.
hist(swath_spread$trmm_precip)
hist(swath_spread$trmm_hours)
We can also fit a generalized additive model to predict different variables. By fitting these models to swath averages you’ve hopefully reduced some of the small scale spatial variability. If you use the by= argument inside s() you can view the coefficients as they change for different values of the variables. Dividing by the mean let’s you compare coefficients for variables of different magnitudes.
Both relief and elevation increase in impact as the values get higher but relief influences the result an order of 2 greater than elevation. Where you are in the profile also has an important influence but it’s relatively consistent. I think the oscillations are because the profiles start at the raster edge and the mountain range curves. As mentioned earlier, you should shift the profiles so that x=0 is at the highest elevation. I think x may represent the depletion of moisture from the air mass as it is pushed over the mountain range.
gam_precip <- gam(trmm_precip~s(elev,by=elev)+s(relief,by=relief)+s(x,by=x),data=swath_spread[rowsample,], link=gaussian())
extract_splines=function(mdl){
sterms=predict(mdl,type='terms')
datplot=cbind(sterms,mdl$model) %>% tbl_df
datplot$intercept=attr(sterms,'constant')
datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
return(datplot)
}
pseudoR2 <- function(yobs,yhat){
mse <- mean((yhat-yobs)^2, na.rm=T)
1-mse/var(yobs)
}
yhat=predict(gam_precip, swath_spread[-rowsample,], type='response')
r2=pseudoR2(swath_spread[-rowsample,]$trmm_precip,yhat)
The pseudo r2 for this model is 0.3637268.
termdF=extract_splines(gam_precip)
ggplot(termdF)+
geom_line(aes(x=relief,y=`s(relief):relief`/mean(relief)))
ggplot(termdF)+
geom_line(aes(x=elev,y=`s(elev):elev`/mean(elev)))
ggplot(termdF)+
geom_line(aes(x=x,y=`s(x):x`/mean(x)))
Some diagnostic plots. Overall the model is fairly well behaved with gaussian residuals and a linear response-fitted relationship.
gam.check(gam_precip)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 21 iterations.
## The RMS GCV score gradient at convergence was 0.1226032 .
## The Hessian was positive definite.
## Model rank = 31 / 31
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(elev):elev 10.00 9.86 0.99 0.28
## s(relief):relief 10.00 9.91 0.90 <2e-16 ***
## s(x):x 10.00 9.99 0.97 0.09 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We again see relief to have the biggest influence by an order of 2-4 compared to elevation, but the standardized coefficients are relatively consistent across all ranges of relief. Elevation coefficients have large oscillations that I’m not sure how to interpret. Possibly also related to the profile shifting?
gam_hours <- gam(trmm_hours~s(elev,by=elev)+s(relief, by=relief)+s(x, by=x),data=swath_spread, link=gaussian())
yhat=predict(gam_hours, swath_spread[-rowsample,], type='response')
r2=pseudoR2(swath_spread[-rowsample,]$trmm_hours,yhat)
The pseudo r2 for this model is 0.3011913.
termdF=extract_splines(gam_hours)
ggplot(termdF)+
geom_line(aes(x=relief,y=`s(relief):relief`/mean(relief)))
ggplot(termdF)+
geom_line(aes(x=elev,y=`s(elev):elev`/mean(elev)))
ggplot(termdF)+
geom_line(aes(x=x,y=`s(x):x`/mean(x)))
More diagnostics. residuals look good, qq plot looks really good.
gam.check(gam_hours)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 18 iterations.
## The RMS GCV score gradient at convergence was 0.02331009 .
## The Hessian was positive definite.
## Model rank = 31 / 31
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(elev):elev 10.00 9.98 1.00 0.47
## s(relief):relief 10.00 9.92 0.86 <2e-16 ***
## s(x):x 10.00 10.00 0.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These analyses suggest that scaling the LTmodel (mm/hr) to real world precipitation (mm/yr) works quite well if you have a spatial map of the hours of precipitation per year. The model is not very sensitive to its parameters, presumably because the spatial heterogeneity of the precip hours has a larger impact. It is not trivial to scale the hourly intensity to a yearly rainfall rate without remote sensing because each grid cell evidently scales differently. Using the observed hours of precipitation, the modeled yearly precip can very closely approximate the observed yearly mean precip and distribution of precip (Analyses 1, 2). Importantly, you can not just scale the modeled distribution to have the same mean or min/max as the observed distribution and expect a realistic distribution (Analyses 3, 4). It is currently not feasible to use the LT model for simulating precip in a landscape evolution model because the spatially variable scaling is required to convert the precip intensity (mm/hr) to mm/yr.
One solution might be to parameterize the hours of precip using topographic features, but my initial analyses for this did not yield great results for an entire domain (Analysis 5). Evaluating the precipitation in 100km-500km swaths parallel to the wind direction provides the most promise for parameterizing precipitation (Analysis 6). This analysis should be extended to other regions to test parameter stability of a statistical model.
Alpert, P. 1986. “Mesoscale Indexing of the Distribution of Orographic Precipitation over High Mountains.” Journal of Climate and Applied Meteorology 25 (4). American Meteorological Society: 532–45. doi:10.1175/1520-0450(1986)025<0532:MIOTDO>2.0.CO;2.
Anders, Alison M, Gerard H Roe, Dale R Durran, and Justin R Minder. 2007. “Small-Scale Spatial Gradients in Climatological Precipitation on the Olympic Peninsula.” Journal of Hydrometeorology 8 (5). American Meteorological Society: 1068–81. doi:10.1175/JHM610.1.
Anders, Alison M, Gerard H Roe, Bernard Hallet, David R Montgomery, Noah J Finnegan, and Jaakko Putkonen. 2006. “Spatial patterns of precipitation and topography in the Himalaya.” In Special Paper 398: Tectonics, Climate, and Landscape Evolution, 398:39–53. Geological Society of America. doi:10.1130/2006.2398(03).
Bookhagen, B., and Douglas W Burbank. 2010. “Toward a complete Himalayan hydrological budget: Spatiotemporal distribution of snowmelt and rainfall and their impact on river discharge.” Journal of Geophysical Research 115 (F3): F03019. doi:10.1029/2009JF001426.
Burbank, D. W., A. E. Blythe, J. Putkonen, B. Pratt-Sitaula, E. Gabet, M. Oskin, A. Barros, and T. P. Ojha. 2003. “Decoupling of erosion and precipitation in the Himalayas.” Nature 426 (6967): 652–55. doi:10.1038/nature02187.
Dadson, Simon J., Niels Hovius, Hongey Chen, W. Brian Dade, Meng-Long Hsieh, Sean D. Willett, Jyr-Ching Hu, et al. 2003. “Links between erosion, runoff variability and seismicity in the Taiwan orogen.” Nature 426 (6967): 648–51. doi:10.1038/nature02150.
Han, Jianwei, Nicole M Gasparini, and Joel P L Johnson. 2015. “Measuring the imprint of orographic rainfall gradients on the morphology of steady-state numerical fluvial landscapes.” Earth Surface Processes and Landforms 40 (10): 1334–50. doi:10.1002/esp.3723.
Molnar, Peter. 2003. “Nature, nurture and landscape.” Nature 426 (December): 11–13. doi:10.1038/426612a.
Reiners, Peter W, Todd A Ehlers, Sara G Mitchell, and David R Montgomery. 2003. “Coupled spatial variations in precipitation and long-term erosion rates across the Washington Cascades.” Nature 426 (6967): 645–47. doi:10.1038/nature02111.
Smith, Ronald B., and Idar Barstad. 2004. “A Linear Theory of Orographic Precipitation.” Journal of the Atmospheric Sciences 61 (12): 1377–91. doi:10.1175/1520-0469(2004)061<1377:ALTOOP>2.0.CO;2.